# Efficient Deployment with vLLM

## DeepSeek-R1-0528-quantized.w4a16

License: MIT · Publisher: RedHatAI
Tags: Large Language Model, Safetensors
Downloads: 126 · Likes: 3

A quantized version of DeepSeek-R1-0528. Quantizing the weights to the INT4 data type significantly reduces GPU memory and disk-space requirements.
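
A minimal sketch of offline inference with vLLM, assuming the Hugging Face repo ID `RedHatAI/DeepSeek-R1-0528-quantized.w4a16` (inferred from the listing's publisher and model name) and illustrative parallelism and sampling settings:

```python
# Offline inference sketch with vLLM.
# The repo ID and tensor_parallel_size are assumptions; a model of this
# size typically needs its weights sharded across several GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/DeepSeek-R1-0528-quantized.w4a16",
    tensor_parallel_size=8,  # illustrative; match your hardware
)
params = SamplingParams(temperature=0.6, max_tokens=256)

outputs = llm.generate(
    ["Explain in one paragraph how INT4 weight quantization saves memory."],
    params,
)
print(outputs[0].outputs[0].text)
```
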
## Qwen2.5-VL-32B-Instruct-FP8-Dynamic

License: Apache-2.0 · Publisher: BCCard
Tags: Image-to-Text, Transformers, English
Downloads: 140 · Likes: 1

An FP8-quantized version of Qwen2.5-VL-32B-Instruct that accepts combined vision-and-text input and produces text output, suited to efficient inference scenarios.
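
Models like this are often served behind vLLM's OpenAI-compatible API. A sketch of a client query, assuming a server was launched separately (for example with `vllm serve BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic`, a repo ID inferred from the listing) and using a placeholder image URL:

```python
# Querying a locally running vLLM OpenAI-compatible server.
# Assumes the server was started beforehand, e.g.:
#   vllm serve BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="BCCard/Qwen2.5-VL-32B-Instruct-FP8-Dynamic",  # inferred repo ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/receipt.png"}},  # placeholder
            {"type": "text", "text": "Describe what this image shows."},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```
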
## gemma-3-27b-it-FP8-Dynamic

License: Apache-2.0 · Publisher: RedHatAI
Tags: Image-to-Text, Transformers, English
Downloads: 1,608 · Likes: 1

A quantized version of google/gemma-3-27b-it with the weights stored in the FP8 data type. It accepts vision-and-text input, produces text output, and can be deployed efficiently with vLLM.
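
A minimal multimodal sketch using vLLM's `LLM.chat` API, which applies the model's own chat template so the image placeholder token is inserted automatically; the repo ID `RedHatAI/gemma-3-27b-it-FP8-dynamic` and the image URL are assumptions:

```python
# Multimodal chat sketch with vLLM's offline LLM.chat API.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/gemma-3-27b-it-FP8-dynamic")  # inferred repo ID

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/diagram.png"}},  # placeholder
        {"type": "text", "text": "Summarize this diagram in two sentences."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```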